Hits 1 – 20 of 21

1	Speech technology for unwritten languages
	Scharenborg, Odette; Besacier, Laurent; Black, Alan...
	In: ISSN: 2329-9290 ; EISSN: 2329-9304 ; IEEE/ACM Transactions on Audio, Speech and Language Processing ; https://hal.inria.fr/hal-02480675 ; IEEE/ACM Transactions on Audio, Speech and Language Processing, Institute of Electrical and Electronics Engineers, 2020, ⟨10.1109/TASLP.2020.2973896⟩ (2020)
	BASE

2	Controlling Utterance Length in NMT-based Word Segmentation with Attention
	Godard, Pierre; Besacier, Laurent; Yvon, François
	In: International Workshop on Spoken Language Translation ; https://hal.archives-ouvertes.fr/hal-02343206 ; International Workshop on Spoken Language Translation, Nov 2019, Hong-Kong, China (2019)
	BASE

3	Unsupervised word discovery for computational language documentation ; Découverte non-supervisée de mots pour outiller la linguistique de terrain
	Godard, Pierre. - : HAL CCSD, 2019
	In: https://tel.archives-ouvertes.fr/tel-02286425 ; Artificial Intelligence [cs.AI]. Université Paris Saclay (COmUE), 2019. English. ⟨NNT : 2019SACLS062⟩ (2019)
	Abstract: Language diversity is under considerable pressure: half of the world’s languages could disappear by the end of this century. This realization has sparked many initiatives in documentary linguistics in the past two decades, and 2019 has been proclaimed the International Year of Indigenous Languages by the United Nations, to raise public awareness of the issue and foster initiatives for language documentation and preservation. Yet documentation and preservation are time-consuming processes, and the supply of field linguists is limited. Consequently, the emerging field of computational language documentation (CLD) seeks to assist linguists in providing them with automatic processing tools. The Breaking the Unwritten Language Barrier (BULB) project, for instance, constitutes one of the efforts defining this new field, bringing together linguists and computer scientists. This thesis examines the particular problem of discovering words in an unsegmented stream of characters, or phonemes, transcribed from speech in a very-low-resource setting. This primarily involves a segmentation procedure, which can also be paired with an alignment procedure when a translation is available. Using two realistic Bantu corpora for language documentation, one in Mboshi (Republic of the Congo) and the other in Myene (Gabon), we benchmark various monolingual and bilingual unsupervised word discovery methods. We then show that using expert knowledge in the Adaptor Grammar framework can vastly improve segmentation results, and we indicate ways to use this framework as a decision tool for the linguist. We also propose a tonal variant for a strong nonparametric Bayesian segmentation algorithm, making use of a modified backoff scheme designed to capture tonal structure. To leverage the weak supervision given by a translation, we finally propose and extend an attention-based neural segmentation method, improving significantly the segmentation performance of an existing bilingual method. 
	Keyword: [INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI]; [INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL]; [INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG]; [INFO.INFO-TT]Computer Science [cs]/Document and Text Processing; Automatic word segmentation; Bayesian models; Bilingual alignment; Low-resource languages; Unsupervised learning
	URL: https://tel.archives-ouvertes.fr/tel-02286425/file/75551_GODARD_2019_archivage.pdf https://tel.archives-ouvertes.fr/tel-02286425 https://tel.archives-ouvertes.fr/tel-02286425/document
	BASE

4	Controlling Utterance Length in NMT-based Word Segmentation with Attention ...
	Godard, Pierre; Besacier, Laurent; Yvon, François. - : arXiv, 2019
	BASE

5	Controlling Utterance Length in NMT-based Word Segmentation with Attention ...
	Godard, Pierre; Besacier, Laurent; Yvon, François. - : Zenodo, 2019
	BASE

6	Controlling Utterance Length in NMT-based Word Segmentation with Attention ...
	Godard, Pierre; Besacier, Laurent; Yvon, François. - : Zenodo, 2019
	BASE

7	Unsupervised Word Segmentation from Speech with Attention
	Godard, Pierre; Zanon Boito, Marcely; Ondel, Lucas...
	In: Interspeech 2018 ; https://hal.archives-ouvertes.fr/hal-01818092 ; Interspeech 2018, Sep 2018, Hyderabad, India (2018)
	BASE

8	Bayesian models for unit discovery on a very low resource language
	Ondel, Lucas; Godard, Pierre; Besacier, Laurent...
	In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) ; https://hal.archives-ouvertes.fr/hal-01709589 ; IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Apr 2018, Calgary, Alberta, Canada (2018)
	BASE

9	Linguistic unit discovery from multi-modal inputs in unwritten languages: Summary of the “Speaking Rosetta” JSALT 2017 workshop
	Scharenborg, Odette; Besacier, Laurent; Black, Alan...
	In: ICASSP 2018 - IEEE International Conference on Acoustics, Speech and Signal Processing ; https://hal.archives-ouvertes.fr/hal-01709578 ; ICASSP 2018 - IEEE International Conference on Acoustics, Speech and Signal Processing, Apr 2018, Calgary, Alberta, Canada (2018)
	BASE

10	Unsupervised Word Segmentation: does tone matter?
	Godard, Pierre; Löser, Kevin; Allauzen, Alexandre...
	In: International Conference on Intelligent Text Processing and Computational Linguistics ; https://hal.archives-ouvertes.fr/hal-01910756 ; International Conference on Intelligent Text Processing and Computational Linguistics, Mar 2018, Hanoï, Vietnam (2018)
	BASE

11	Adaptor Grammars for the Linguist: Word Segmentation Experiments for Very Low-Resource Languages
	Godard, Pierre; Besacier, Laurent; Yvon, François...
	In: Workshop on Computational Research in Phonetics, Phonology, and Morphology ; https://hal.archives-ouvertes.fr/hal-01910757 ; Workshop on Computational Research in Phonetics, Phonology, and Morphology, Oct 2018, Bruxelles, Belgium. pp.32 - 42, ⟨10.18653/v1/P17⟩ (2018)
	BASE

12	Parallel Corpora in Mboshi (Bantu C25, Congo-Brazzaville)
	Rialland, Annie; Adda-Decker, Martine; Kouarata, Guy-Noël...
	In: 11th edition of the Language Resources and Evaluation Conference (LREC 2018) ; https://hal.archives-ouvertes.fr/hal-01710043 ; 11th edition of the Language Resources and Evaluation Conference (LREC 2018), ELRA, May 2018, Miyazaki, Japan (2018)
	BASE

13	Linguistic unit discovery from multi-modal inputs in unwritten languages: Summary of the "Speaking Rosetta" JSALT 2017 Workshop ...
	Scharenborg, Odette; Besacier, Laurent; Black, Alan. - : arXiv, 2018
	BASE

14	Unsupervised Word Segmentation from Speech with Attention ...
	Godard, Pierre; Zanon Boito, Marcely; Ondel, Lucas. - : arXiv, 2018
	BASE

15	BULB: Breaking the Unwritten Language Barrier
	Adda, Gilles; Adda-Decker, Martine; Ambouroue, Odette...
	In: Procedia Computer Science ; Computational Methods for Endangered Language Documentation and Description ; https://hal.archives-ouvertes.fr/hal-01836496 ; Computational Methods for Endangered Language Documentation and Description, May 2016, Yogyakarta, Indonesia. pp.8-14, ⟨10.1016/j.procs.2016.04.023⟩ (2016)
	BASE

16	Breaking the unwritten language barrier: the BULB project
	Adda, Gilles; Stüker, Sebastian; Adda-Decker, Martine...
	In: SLTU-2016 5th Workshop on Spoken Language Technologies for Under-resourced languages ; https://halshs.archives-ouvertes.fr/halshs-01428027 ; SLTU-2016 5th Workshop on Spoken Language Technologies for Under-resourced languages, May 2016, Yogyakarta, Indonesia. ⟨10.1016/j.procs.2016.04.023⟩ (2016)
	BASE

17	Preliminary Experiments on Unsupervised Word Discovery in Mboshi
	Godard, Pierre; Adda, Gilles; Adda-Decker, Martine...
	In: Interspeech 2016 proceedings ; Interspeech 2016 ; https://hal.archives-ouvertes.fr/hal-01350119 ; Interspeech 2016, Sep 2016, San-Francisco, United States (2016)
	BASE

18	Innovative technologies for under-resourced language documentation: The BULB Project
	Adda, Gilles; Adda-Decker, Martine; Ambouroue, Odette...
	In: CCURL proceedings ; Workshop CCURL 2016 - Collaboration and Computing for Under-Resourced Languages - LREC ; https://hal.archives-ouvertes.fr/hal-01350124 ; Workshop CCURL 2016 - Collaboration and Computing for Under-Resourced Languages - LREC, May 2016, Portoroz, Slovenia (2016)
	BASE

19	Breaking the unwritten language barrier: the BULB project
	Adda, Gilles; Stüker, Sebastian; Adda-Decker, Martine...
	In: SLTU-2016 5th Workshop on Spoken Language Technologies for Under-resourced languages ; https://halshs.archives-ouvertes.fr/halshs-01428027 ; SLTU-2016 5th Workshop on Spoken Language Technologies for Under-resourced languages, May 2016, Yogyakarta, Indonesia. ⟨10.1016/j.procs.2016.04.023⟩ (2016)
	BASE

20	Innovative technologies for under-resourced language documentation: The BULB Project
	Lamel, Lori; Makasso, Emmanuel-Moselly; Rialland, Annie...
	In: CCURL proceedings ; Workshop CCURL 2016 - Collaboration and Computing for Under-Resourced Languages - LREC ; https://hal.archives-ouvertes.fr/hal-01350124 ; Workshop CCURL 2016 - Collaboration and Computing for Under-Resourced Languages - LREC, May 2016, Portoroz, Slovenia (2016)
	BASE
